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Abstract 

We study the evolution of a finite size population formed by mutationally isolated lineages of error-prone replicators in a 
two-peak fitness landscape. Computer simulations are performed to gain a stochastic description of the system dynamics. 
More specifically, for different population sizes, we compute the probability of each lineage being selected in terms of their 
mutation rates and the amplification factors of the fittest phenotypes. We interpret the results as the compromise between 
the characteristic time a lineage takes to reach its fittest phenotype by crossing the neutral valley and the selective value of 
the sequences that form the lineages. A main conclusion is drawn: for finite population sizes, the survival probability of the 
lineage that arrives first to the fittest phenotype rises significantly. 
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Introduction 

The quasispecies model is a paradigm of the evolution of self- 
replicative sequences [1-3]. It assumes a population of error-prone 
replicators that evolves under the selective pressure caused by their 
competition under any constraint, e.g. constant population. 
Although this model has been mainly developed in the determin- 
istic limit, i.e. under the assumption of infinite size population and 
frxed environmental conditions, the relevance of fluctuations in its 
dynamics was already stressed in Eigen's seminal paper [4] . Given 
a population of error-prone self-replicative sequences of v 
binary digits, the total number of different sequences that can be 
formed is 2". Therefore, the ratio p = N2^^' provides the 
probability of finding a particular sequence in a neutral landscape. 
This number is extremely low even for very large population sizes 
(e.g. if v=100 and iV=10', then pxlO^^^). Fitness differences, 
together with initial conditions, make some sequences more 
frequent than others. Precisely, the fittest sequences, in the event 
that they exist, are the target of evolution by natural selection. 
Although the influence of the mutation rate in this evolutionary 
process has been widely studied [5-8] , less attention has been paid 
to the relation of the mutation rate and the evolutionary time [9- 
11]- 

In the simplest non-neutral fitness landscape, it is assumed that 
all sequences except one, the master or fittest sequence, have equal 
fitness. If initially the population has a non-nuU proportion of the 
master sequence and the mutation rate is low enough, as time 
passes a distribution of mutant sequences is formed around the 
master sequence. This state is usually called quasispecies [4] . This 
distribution is quite stable even for finite size populations. On the 
contrary, as a consequence of the error-prone self-replication, the 
quasispecies can be destabilized if a higher second fitness peak (e.g. 



another sequence with a larger amplification factor) exists. The 
evolution towards the fittest sequence depends on several factors, 
mainly the mutation rate, the Hamming distance between the two 
peaks, the relative difference between the two fitness peaks and, as 
will be stressed in this paper, on the population size. For finite size 
populations, searching for new genotypes is restricted to a close 
neighborhood of the steady quasispecies. The exploration of the 
far distant sequence landscape is practically unreachable in finite 
time because, as has been said above, only populations of the order 
of 2" have a non-negligible probability of finding a new sequence 
located at a medium Hamming distance (e.g. c/= 10). 

Besides this limitation, finite size effects become apparent when 
competition between independent lineages occurs [12,13]. If we 
consider two lineages formed by error-prone sequences that evolve 
in a two-peaks landscape, each with a different mutation rate, the 
question arises as to which of them wiU survive in the stationary 
state if initially each lineage occupies a fraction of the population. 
As we will see in the Results section, the answer depends on the 
size of the whole population. It is shown that optimal mutation 
rates exist that enhance the probability of survival of a Kneage (and 
so, forming a quasispecies peaked around the fittest phenotype). 
Since having different mutation rates implies different evolution- 
ary times, this result is explained as a consequence of arriving first 
to their fittest sequence. 

In order to compute an evolutionary time in infinite populations 
described in terms of ordinary differential equations (e.g. using the 
molar fraction of each phenotype) the characteristic time has been 
introduced beforehand [14]. Recently, this approximation has 
been used to quantify the dependence of the evolutionary time on 
the mutation rate for different fitness landscapes [15]. We showed 
that, as a consequence of the trade-off between the searching 
capabilities and the fixation probabilities of the master sequences. 
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the characteristic time exhibits a minimum for a positive mutation 
rate lower than the error threshold. We discussed the consequenc- 
es of arriving first for a population of error-prone replicators and 
realized that having a low evolutionary time (i.e. a mutation rate 
close to the optimal value) could have determinant consequences 
in finite size populations. To evaluate the evolutionary time in this 
case we apply a generalization to this characteristic time [16]. 
However, its computation is seriously limited when either the 
population size or the mutation rate are too small. 

The main goal of this paper is to study the evolution of a finite 
size population formed by mutationally isolated lineages of error- 
prone repKcators in a two-peak fitness landscape and the influence 
that some parameters, namely the mutation rate and the 
population size, have on this dynamics. In all cases, competing 
lineages that have the same fitness landscape differ exclusively in 
their mutation rates. Since the final outcome of lineage 
competition will depend on the intra-lineage evolutionary 
processes, we first deal with this internal competition from a 
deterministic perspective in the first subsection of Results. Finite 
size effects in inter-lineage competition is studied computationally 
by means of a reduced model that is presented, justified and 
compared with the complete model in the next subsection in the 
Results. In the subsection ''Lineage competition" this reduced 
model is applied to obtain the prevalence of each Uneage in terms 
of their characteristic time and mutation rates for difierent 
population sizes. It is proven that the percentage of survival of the 
mutator lineages is not obviously dependent on the population 
size, which can be explained I))" die cliaracteristic time of the 
lineages. In the Discussion we summarize our results and make 
some brief comments about their implications for real systems such 
as viruses and bacteria. 

Results 

Intra-lineage competition 

Let us assume first a population of binary sequences of length v 
that forms a unique lineage. An amplification factor that measures 
its propensity to self-replicate is assigned to each sequence (its 
fitness). The model assumes a two-peak fitness landscape, i.e. there 
are three distinct phenotyp(;s: Iq. the sequence whose digits are all 
0, Ii, the sequence whose digits are all 1 and Ig, the error tail, that 
is formed by the rest of the sequences. The amplification factors of 
the master sequences /q and Ii are Aq and Ai, respectively. The 
amplification factor of the error tail is denoted as A^ and verifies: 
Ai > Ao > Ag. A similar fitness landscape with two equal peaks was 
previously apphed in [17] to study the distribution of mutants in a 
degenerated quasispecies. 

Self-replication is error-prone. As usual, q is the quality factor 
per digit, i.e. the probability of exact self-replication of each digit. 
The mutation rate fi per digit is, therefore, )i=\—q. In reference 
to master sequence /q, the sequences that differ in d digits form the 
Hamming class Hi. The mutation matrbc that yields the 
probability that a sequence of Hamming class Hi produces during 
replication a sequence of the Hamming class Hj^ is given by [18]. 

M fk\fv-k\ /\-q\''+'-'-' 

If we assume that every sequence belonging to each Hamming 
class has the same amplification factor and that the total 
population is kept constant, the time evolution of the molar 
fraction of each Hamming class, yhj is described by the ODE 
system: 



^ =yhj{Aj Qjj - Aiyhi) + ^ Qjkyhk (2) 

for 7 = 0, . . . ,v. Here, without loss of generality, a null death rate of 
Di for all sequences has been assumed. Therefore, the selective 
value [4] of each Hamming class is Wk = AtQkk- 

If initially the whole population is considered to be formed only 
by master sequences /q then, as time passes, a first quasispecies is 
obtained around /o until it is displaced by the formation of a 
second quasispecies around the fittest genotype I\. The latter 
quasispecies is asymptotically stable and its structure depends 
mainly on both the mutation rate n and the ratio A\/Ag. 

Throughout the paper we wiU take a sequence length v= 10. 
For this case, the non-Unear ODE system of Eq. (2) has eleven 
differential equations that can be solved numerically using, for 
instance, a Runge-Kutta method implemented in MATLAB. The 
characteristic time for the time evolution of the master copy I\, 
denoted as Tc, is then computed as in [14,15] (see also the section 
Methods). As an example, Fig. 1 depicts the evolution of the molar 
fractions of each Hamming class for the initial value problem with 
j/io(? = 0) = 1 and j'/i,(/ = 0) = 0 for all ; = 1, ... ,10 for a mutation 
rate = 0.025. The amplification factors are taken: Aq=2, 
^1 = 10 and Ag = \. As can be seen in Fig. 1, the population of 
the different Hamming classes appears and disappears successively 
until a stationary state is achieved. This stationary state is formed 
by a distribution of Hamming classes around the fittest sequence 
I\, forming a quasispecies. Two important points are worth 
stressing here. The first one is that the mutation rate determines 
the characteristic time of the formation of this final quasispecies. 
The second one concerns the low concentration that the mutant 
phenotypes have during the evolution from the master phenotype 
/q to the other master Ii . It is precisely the convergence of both 
factors that makes internal fluctuations especially relevant when 
several lineages with different mutation rates compete in finite size 
populations. Indeed, for finite size populations, having a lower 
characteristic time that allows them to reach the fittest phenotype 
first could favor the selection of the lineage with the larger 
mutation rate (contrary to the deterministic prediction) because 
the phenotypes of the other lineage die out. Obviously, in the limit 
of infinite population sizes having a low characteristic time is not 
relevant because, independentiy of the intermediate low concen- 
tration of the mutants the lineage with the lower mutation rate, 
which has a larger selective value, will asymptotically dominate the 
equilibrium population (forming a quasispecies around its fittest 
genotype The influence of the characteristic time on the 
selective properties of independent lineages wiU be explored in 
detail in the following sections. 

However, first at all, we have to overcome a technical problem 
caused by the natural computational limitations. The model 
presented in the previous paragraphs assumes a certain Hamming 
distance between the two master sequences. In the deterministic 
limit, this distance can be covered in a reasonable time since an 
infinite number of sequences are self-replicating and, as a 
consequence of mutation, effectively looking for new genotypes 
in the sequence space. However, when the size of the population is 
finite and much lower than the size of the sequence space 2", the 
searching capabilities of the population are drastically reduced and 
the computational time rises enormously. This fact, in practice, 
prevents the computation of the characteristic time and, conse- 
quently, a complete study of the finite size effects in the evolution 
of this kind of replicator systems. 
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Figure 1. Time evolution of the molar fraction of each of the 
eleven Hamming classes [Ho to Hio) that form the sequence 
space when v = 10. Initially, the whole population is formed by 
sequences /o, i.e. v/io(< = 0) = l. It is assumed that the amplification 
factors of all the sequences that belong to the Hamming classes 
(H2,...,Hq) are equal and are given by Ai, = 1. The amplification factors 
of the master copies that form the Hamming classes Hq and H\o are 
Aq = 2 and Ai = 10, respectively. The mutation rate is /( = 0.025. 
doi:1 0.1 371 /journal.pone.0083 1 42.g001 

A reduced model. To avoid this drawback, we define a 
reduced model that considers all the Hamming classes from Hi to 
Hg as only one class, /,,, that can be taken as an error-tail of both 
master sequences. Essentially, this reduction rescales the evolu- 
tionary time of the population. In so doing, the complete ODE 
system of Eq. (2) of dimension v + 1 is simplified to the following 
tridimensional ODE system: 



dt 
dyre 

dt 
dyr\ 

dt 



yroiAo goo - E, Aiyri)+ Ea.-^o Qokyr/c 

yrMe Qee - E," ^iV^i) + E/t#P Qekyrt (3) 

yr\(Ai Qii - E, Aiyri)+ E^-^i ^/t 2i/t 



where yrQ,yri , and yr^, are the molar fractions of the master copies 
/o, h and the error tail le in the reduced scheme. 

A reasonable choice for the mutation rate of the error-tail, 7^, 
for either of the master copies, /q and Ii , is as a weighted average 
rate over the intermediate Hamming classes, i.e. 



Er 



(4) 



for 7 = 0,1. The combinatorial term takes into account the number 
of sequences that form each of the Hamming classes. Thus, the 
mutation matrix for this model is given by: 



Q- 



/-'(I -9)' 



(1-9)" 



/-'(I -9)' 



(5) 



Fig. 2A and 2B compare the trajectories obtained by numerical 
integration of Eq. (3) and the complete system of Eq. (2) when 
v=10 and Ao = 2, Ae = l and = 10 for two mutation rates 
/i = 0.025 and ^ = 0.0001, respectively. As can be seen, both 
systems behave similarly although over a different time scale. As 
expected, the same stationary state is reached but sooner in the 
reduced system. Fig. 3 depicts the characteristic time Tc as a 
function of the mutation rate for both models for /i-values in the 
interval [10^'',0.3991] taken at steps of 10^^. As before, the values 
of the amplification factors are Aq=2, Ag = \ and Ai = 10. As can 
be appreciated in Fig. 3, the r^-curve of the reduced model is 
qualitatively similar to that of the general model, though it is 
displaced to lower values. The differences between both models 
are more significative for low values of the mutation rate. Note 
that for mutation rates larger than xO.2 the population is in error 
catastrophe and the phenotype with the largest amplification 
factor is no longer selected [19]. In conclusion, at least at a 
qualitative level, the reduced model provides a reasonable 
description of the evolutionary behavior of the population but in 
a much shorter time scale. As will be shown in the next section, 
this reduction is going to allow an exhaustive study for low size 
populations. 

The characteristic time of the time evolution of Ii for different 
values of the amplification factor Ai for the reduced model is 
shown in Fig. 4. The figure depicts Tc for Ai{ = 2, 2.1, 2.2, 5, 10, 
20, and 30) as a function of the mutation rate /je[10^'',0.4] (with a 
^-step equal to 10^^). Since takes different scales as the value of 
Ai approaches that of ^o, the curves for Ai =2,2.1 and 2.2 have 
been included in the inset. As before, Aq=2 and = 1. As it can 
be observed, the curves for large values of Ai axt quahtatively 
similar, all exhibiting a minimum value for approximately the 
same mutation rate j.igp and a relative maximum near the error 
catastrophe (an extended description of this behavior has been 
previously presented in [15]). Note that as the amplification factor 
A\ decreases, the curves move to the left and to higher values of 
Tc. In the limit, when A\ tends to 2 from above, the characteristic 
time increases enormously (several orders of magnitude higher 
than the scale used in Fig. 4) for all values of ji. Moreover, the 
relative maxima disappear in the degenerate case Aq = Ai=2, 
while the characteristic time reduces monotonously with n before 
entering the error catastrophe. The five points in the curves show 
the values that will be analyzed in more detail in the following 
subsections. 

A stochastic simulation. As has already been stressed, die 
size of real populations is much lower than the size of the sequence 
space and, therefore, finite size effects may become relevant. If, in 
addition, competition is present, the deterministic approximation 
that considers infmite size populations does not assure reasonable 
results. Different approaches have been proposed to handle finite 
size populations [20-22]. In general, analytic methods that search 
for explicit solutions have practically been discarded due to the 
system complexity. Instead, computational algorithms have proven 
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Figure 2. Time evolution of the molar fractions of each of the 
three phenotypes (Aa=l, Ac = \ and ^i = 10) for the h-model 
divided into Hamming classes (denoted by the subindex h and 
curves in blue) and the reduced r-model (subindex r and red 
curves). The mutation rates are (A) /( = 0.025 and (B) /( = 0.0001. Note 
that the trajectories of both models are quite similar to the trajectories 
corresponding to the reduced model shifted to the left i.e. to lower 
values of time. 

doi:1 0.1 371 /journal.pone.00831 42.g002 



Figure 3. Characteristic time 7c of the molar fraction of the 
phenotype Ai whose genotype is formed by all 1 as a function 
of the mutation rate for both the h-model (blue) and the r- 
model (red). The mutation rate varies in the interval [10"'',0.3991] in 
constant steps of A/(= 10^'. The amplification factors are Ao = 2, Ae=l 
and Ai = 10. Note that both curves are qualitatively similar to that 
corresponding to the reduced model shifted to lower values of the 
characteristic time. 

doi:10.1371/journal.pone.0083142.g003 

over the la,st 1000 steps and second, we compensate the areas over 
the averaged curve with those below the curve. This is equivalent 
to considering a straight line as the asymptotic value of the 
population [16]. AU the simulations are run long enough to assure 
that the population has reached its asymptotic phase and that its 
average value is approximately that of the stationary state. Table 1 
shows the average values of the characteristic times and the 
standard deviations generated from 200 independent simulations 
for five mutation rates and different population sizes. All the 
simulations have been performed using sequences of length v = 1 0 
and amplification factors: Ao = 2 and Ae = \. As before, initially 
the whole population is composed by master sequences /q. As can 
be seen, the characteristic time is very large for the lowest 
mutation rate, /.i=10^^ in comparison with the rest, mainly 
caused by the high searching time. Large characteristic times also 
appear for the largest mutation rate analyzed, /J = 1 0 ^ ' . However, 
in this case, this is a consequence of the high values of the fixation 
time, i.e. once the fittest phenotype I\ is found, the time the 
population takes to stabilize the quasispecies peaked around I\. 
Between these two extremes, the stochastic Tc exhibits a minimum 
value that occurs, depending on the population size, in = 0.025 
or /i = 0.05 (as indicated by a superscript in the table). 



to be very efficient, although very time-consuming [23]. In this 
paper, we have used a Gillespie's stochastic simulation algorithm 
(SSA) [2 1] to carry out simulations of a finite size population of 
sequences that is described by the reduced model presented in the 
previous subsection Eq. (3). 

To compute the characteristic time of the time evolution of the 
fittest sequence in each simulation first we average its population 



Lineages competition 

Competition is another factor that can enhance finite size effects 
on populations of replicators. We postulate that the sequences of 
each lineage cannot change their mutation rate. This is a 
reasonable assumption when the mutation rate varies on a time 
scale greater than that of the competition [24] . Since lineages are 
independent of each other, the extinction of one lineage is an 
absorbing barrier. As a consequence, the internal noise inherent to 
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Figure 4. Values of the characteristic time for the reduced r- 
model for mutation rates in the range [10 '',0.4] at constant 
steps A^( = 10^^. In the main figure the amplification factor A\ of the 
fittest phenotype is: 5 (blue curve), 10, (green), 20 (red) and 30 (cyan). In 
the inset, Ai takes the values: 2 (blue curve), 2.1 (green curve) and 2.2 
(red curve). In all cases, Ao=2 and = 1. The points in the curves of 
the main picture correspond to the values mutation rate (from left to 
right), /i = 0.001,0.025,0.05,0.075,0.1. These values are applied later in 
stochastic simulations. As expected, increasing the value of the highest 
peak in the sequence landscape, Ai, reduces the characteristic time. 
Furthermore, as depicted in the inset, as Ai approaches Ao the curves 
tend to become monotonous and move up several order of magnitude. 
doi:10.1371/journal.pone.0083142.g004 

finite size populations can completely change the fate of evolution. 
In this section, lineages with different mutation rates compete 
under a constant in average and finite total population constraint. 

Let us first consider a population of two lineages, Lf ^^ and L*^', 
each one formed by three phenotypes Aq\ A!f and J^^ for /= 1,2 
that evolve in a two-peaks landscape. The mutation rates per digit 
of the sequences in each lineage are denoted by /i^'' (/=1,2). As 
before, the amplification factors of the sequences are = 2, 
Ag = \ and ^ i = 1 0 and equal for both lineages. 

One interesting case occurs when one of the lineages is error 
free, i.e. = 0. We want to estimate the probability of fixation of 
the other lineage for diflFerent mutation rates l.P'\ From the 
analysis of the deterministic equations, there exists a critical value 
such that if 0<A('2' < Ai^^* « 0.08764, the lineage L*^) takes 
over the entire population independently of the initial conditions. 
Otherwise, L*'* is selected for all the initial conditions. This result 
is no longer valid for finite size populations as is shown in Fig. 5 . In 
fact, the probability of fixation of lineage L'^' is less than 1, i.e. less 
than 100% of the simulations yield a fixation of L*^' for all /J*^' for 
A'^= 10'. For = 10^, this probability equals 1 for some values of 
iP'^ within an interval contained in [0,^[,^']. Note that, contrary to 
the infinite approximation, when /i'^'— >0 the probability of 
fixation also converges to 0. A smooth transition to null probability 
also appears for ^('^' -values below the deterministic critical value 
H^p . The results depicted in Fig. 5 are obtained from 200 



Table 1. Mean and standard deviation of the characteristic 
time obtained in the stochastic simulations. 





N 


Mutation rate 














2.5x10 ^ 


5x10 ^ 


7.5x10 ^ 


10^ 


5 


10^ 


90 "'"83 


5 1 "'"0 4^^ 


5.6±0.3 


6.8±0.2 


8.8±0.3 




10"^ 


1 


4 9"'"0 I'' 


5.54±0.08 


6.74±0.07 


8.8±0.1 




10' 


8±2 


4.88±a.04° 


5.53±0.02 


6.74±0.03 


8.81 ±0.03 


10 


10= 


80±81 


2.4±0.3° 


2.5±0.2 


3±0.1 


3.6±C.2 




10"^ 


10±7 


2.1 ±0.1° 


2.42 ±0.06 


2.9±0.05 


3.57±0.04 




10' 


5±2 


2.12±0.03° 


2.42±0.02 


2.89±0.01 


3.57±0.01 


20 


10= 


58±58 


1.3±0.3 


1.3 ±0.2° 


1.5±0.1 


1.8±0.1 




10"^ 


9±7 


1.08±0.08° 


1.21 ±0.04 


1.43±0.04 


1.75±0.03 




10' 


3±1 


1.05±0.02'' 


1.2±0.01 


1.42±0.01 


1.74±0.01 


30 


10= 


58±54 


1±0.3 


0.9±0.1° 


1±0.1 


1.2±0.1 




10"^ 


7±6 


0.76±0.08'' 


0.83 ±0.05 


0.98±0.04 


1.19±0.03 




10' 


3±1 


0.71 ±0.02° 


0.82±0.01 


0.97±0.01 


1.18±0.01 



^Lowest values of T^.. 

Mean and standard deviation of T, for the phenotype A\ for different values of 
the mutation rate, population size and amplification factor A[. In all 
simulations, the whole population is initially formed by sequences /q with an 
amplification factor .4o = 2. As before, the amplification factor of the error tail 
is A^, = \. For each experimental setup 200 runs were performed. 
doi:1 0.1 371/journal.pone.0083142.t001 

simulations and an initial population divided into 90 per cent /q'' 
and 10 per cent /q . 

When the mutation rate of the sequences that form both 
lineages is larger than 0, concretely = lO^'* and 

^*^' = 2.5x 10^^, the dynamics become more complex. Fig. 6A 
shows the time evolution of the molar fraction of all the 
phenotypes {Ao,Ae, and Ai) of each lineage in the deterministic 
approximation (obtained by numerical integration of the corre- 
sponding ODE systems Eq. (3)). The total molar fraction of each of 
the lineages is also shown in Fig. 6B. As can be seen, the system 
tends asymptotically to select the lineage with the largest selective 
value that corresponds to that with the lowest mutation rate, i.e. 
L'''. Nevertheless, the characteristic time of lineage L*^' is small 
enough with respect to the corresponding Tc of lineage i*'' to give 
rise to three phases in the dynamics: (i) an increase in the 
proportion of Z,''' in the population, with the symmetric decrease 
of the proportion of L*^*. In this phase, none of the hneages have 
achieved their largest phenotype with yli = 10. But, because the 
selective values for their master sequences with Ao = 2 verify 
Wg^>W^^\ then displaces L^^\ at least momentarily, (ii) 
Since the mutation rate of Z,*^' is much larger than that of L*'', its 
characteristic time is much lower and its corresponding fittest 
sequence is found first. This phenotype self-replicates better than 
the rest and, in consequence, almost displaces lineage L*'' 
although, at this time, it is mostiy formed by sequences /q'' and 
7^''. (iii) Finally, the lineage L*'' finds its best phenotype and, 
because If'{''> PFp', grows to reach its stationary concentration 
and displaces the phenotypes of lineage L*^' that becomes extinct. 
Consequentiy, after this third phase, the whole population is 
formed only by sequences of L^^\ Importantly, the action of 
internal noise in the second phase of the time evolution of the 
lineages is going to be responsible for the disparity between the 
results obtained in the finite and infinite approximations. Finally, it 



PLOS ONE I www.plosone.org 



5 



December 2013 | Volume 8 | Issue 12 | e83142 



The Advantage of Arriving First 



M-l 

O 

c 
o 

■H 
+> 

n) 
X 
■H 



100 



80 



60 



40 



20 



0 If 



L<2> 

N = le+04 B 

N = le+05 

N = le+06 

_a. — B- — -a B-^ — I 

a.. y, : 



0.8 



0.6 y 



0.4 



0.2 



0.0001 0.02 0.04 0.06 0.08 0.1 
Mutation rate (|l=l-q) 

Figure 5. Percentage of fixation of lineage L^^^ formed by error- 
prone self-repiicative sequences against lineage L*'' formed by 
a sequence with a null mutation rate (i.e. /<<')=0) for different 
population sizes A'. 90% of the initial population is formed by 
genotypes /q" of lineage L'" and the rest 10% of genotypes J^^K The 
amplification factors are: ^o" = 2 for and ^!f' = 2, ^[?> = 1 and 

Concretely, the /((^'-values used are: 



A' 



(2) 



J — 5 for L^-^K The mutation rate of the sequences of L*^', il^', ranges 



from IQ-" to IQ- 
0.0001,0.0125,0.025,0.0375,0.05,0.0625,0.075,0.0875 and 0.1. The pop- 
ulation sizes that correspond to each curve are: A'^10'' (blue line), 
A' = lO' (green) and N = 10^ (red). The right vertical axis represents the 
deterministic molar fraction. The violet curve represents the equilibrium 
molar fraction obtained by numerical integration of the ODE system for 
values of /(•-' at constant steps of 10^''. Note that in the deterministic 
limit of infinite population an abrupt transition occurs at a mutation 
rate of /i*-' s: 0.08764. As it can be seen, this transition occurs gradually 
for finite size populations. 
doi:1 0.1 371 /journal.pone.00831 42.g005 

is worth mentioning that the average selective value of the 
population W approaches asymptotically its maximum value 



Pt^J'* = 10(1 — 10^'')"', although it can decrease momentarily in 
some of the phases of evolution (e.g. during the first phase, in the 
case depicted in Fig. 6C). 

Fig. 7 shows how the probability of fixation of lineage L*^* varies 
with the population size and the amplification factor of the fittest 
phenotype Ai. In all these experiments the mutation rate of 
lineage Z,''' is /j''' = 10^*. The rest of the parameters are kept as 
before, i.e. Aq=2, Ag = \. Initially, the two lineages are equally 
represented in the whole population and they are formed 
exclusively by phenotypes Aq. Each experiment has been repeated 
1000 times for low population sizes and 200 times for larger ones. 
Fig. 7 A shows the percentage of frxation of L*^' when Ai=5. In 
this case, larger values of the mutation rate yield lower 
probabilities of fixation of L^^\ In Fig. 7B, when A\ = 10, the 
results are not so evident, although seems to be opposite, i.e larger 
mutation rates give rise to larger probabilities of frxation of 
L'^'. In the other two figures, 7C and 7D, the situation is clearly 
established. Besides, the probability of fixation of L*^' for all 



population sizes and all mutation rates shows a monotonous 
dependence on ^i. Note that, in contrast to the deterministic 
description, all curves converge to a 100% fixation for high 
population sizes. The exception is the case Ai=5 for the mutation 
rate /.(*^' = 0.I where the probability of fixation of lineage L*^' 
remains nuU for all population sizes. This result can be explained 
by the closeness of this mutation rate to the error threshold. 

To further investigate how the probability of fixation depends 
on the initial conditions and the value of the highest peak (largest 
amplification factor) Ai, we carry out a serial of experiments 
whose results are summarized in Fig. 8. In all the simulations the 
mutation rates for both lineages are frxed: ^*'* = 10^'' and 
^P)_Q ()25 for L^'' and L^^\ respectively. In the deterministic 
approach, since the selective value of the fittest phenotype of each 
lineage verifies IT*" =^i(l - 10-")'° > W^p' = ^i(I -0.025)'° 
then, the lineage Z.''' is selected. As can be seen, this is not the 
case when the size of the population is not high enough to reach 
the deterministic limit. In fact, even for very large populations sizes 
(i.e. iVe[10^10']), the fLxation of L*^) reaches 100% of the 
simulations. For A^e[10'',10'] a non-nuU probability of fixation of 
L'^' exists that, for a given population size and initial conditions, 
tends to increase with the amplification factor A i . However, for 
smaller population sizes the internal noise is so high that the 
fixation of L'^' is very low. Note that, even when the initial 

(2) 

condition of /g is low a high probability of frxation still exists for 
population sizes in the interval [10'', 10^]. 

In summary, all these results confirm that for population sizes 
which are high, but not high enough to reach the deterministic 
limit, the lineage with the largest value of the mutation rate (the 
mutator lineage) can take over the whole population. This is a 
consequence of arriving first to the fittest phenotype which, by 
natural selection, displaces the less fit sequences of the other 
lineage. The important fact is that, a priori, the fittest phenotype, 
that belongs to the low mutator lineage, is never reached. The 
question arises as to whether an optimal mutation rate exists that, 
for a given population size, optimizes the probability of fixation. 
This question is addressed next by studying the time evolution of a 
finite population formed by five lineages with different mutation 
rates. 

As in the previous simulations, an initial population divided 
equally among five lineages with different mutation rates evolves 
over time until the stationary state, i.e. the selection of one of the 
lineages, is achieved. Initially, only sequences /q' exist. In all 
lineages the amplification factor of the error tail is Ae = \. The 
mutation rates of each lineage take the values already highlighted 



Fie. 



„(2) = 



concretely: ' = 10 
^w; = 3x lU-', = 7.5 X 10-2 /i(5) = l0-i. The amplifica- 
tion factors of the fittest sequence in each lineage are equal and is 
varied in the simulations (see Table 2). The population size ranges 
from 10^ to 10^. As before, the results shown in Table 2 are 
obtained from 1000 simulations for Af= 100,1000 and 200 
simulations for larger population sizes. As can be seen in this 
table, for all values of A\ and population sizes in the interval 
[10^,10^] the lineage L'^) selected. Importantly, this lineage has 
the mutation rate that yields the lowest characteristic time of its 
fittest sequence, i.e. ^1 = 2.5 x 10-^. For = 10' and amplification 
factor ^1=30, the lineage L*^* presents similar percentages of 
selection. On the contrary, for low population sizes (10^ and 10^) 
and values of the amplification factor A\ =5,10,20, the lineage 
L''' with the lower mutation rate and a large selective value is 
mostly selected. For the intermediate size = 1 0" none of the 
lineages have a clear selective advantage, which is likely due to the 
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Figure 6. Lineage competition in the deterministic limit 
obtained by numerical integration of the corresponding ODE 
system. The mutation rates of the two lineages i*'' and L*^* are 
^i'" = 0.0001 and /i*^* = 0.025, respectively. As before, the amplification 



factors of each phenotype in both lineages are: A'g^ =A'g' = 2, 



4" = 4^' = 1 and A' 



(1). 



10. The whole population is initially 



formed by genotypes Ig, shared equally in both lineages. Figure (A) 



depicts the time evolution of each of the three phenotypes that form 
each lineage. Solid curves correspond to phenotypes of L"', whereas 
dashed lines are for phenotypes of L*^'. In figure (B) the three 
phenotypes are aggregated to yield the molar fraction of each lineage, 
L*'' (solid blue line) and L'-' (dashed green line). In (C) the time 
evolution of the average fitness of the population {IV) is shown. The 
three phases that appear in the temporal evolution of the phenotypes 
and molar fractions of the lineages are separated by vertical black lines 
(see main text for more details). 
doi:10.1371/journal.pone.0083142.g006 



fact that the selection pressure and time evolution are almost 
compensated. In any case, these results suggest that, at least for an 
intermediate range of the population sizes [10',10^], the mutation 
rate that provides the minimum Tc is highly correlated with the 
probability of survival under a selective pressure. 

Discussion 

The validity of the infinite size approximation, i.e. the 
deterministic approach, depends very much on the problem 
under study. For instance, finite size effects do not seem to be 
particularly relevant for the formation of a quasispecies around a 
wild type phenotype [25]. In contrast, this limit is scarcely valid for 
quantifying the evolutionary time of populations of sequences or 
lineages of sequences. It turns out that for small population sizes 
evolution can be practically impeded due to the huge increase of 
the searching time, i.e. the time needed to find a better phenotype. 
This fact can change the fate of evolution as deduced from the 
deterministic description. In this paper, we have explored the 
evolutionary time in fmite populations using a simple model of 
quasispecies lineages that evolve in a two-peak landscape. 
Importandy, the fmite size effects are so drastic that the 
deterministic limit cannot be applied to predict the evolution 
outcome. The question of how large the population must be to 
assure the deterministic limit is, in our opinion, of great interest. 
Nonetheless, as has been pointed out above, it depends on the 
intrinsic characteristics of the problem, in particular, on the fitness 
landscape, the mutation rates and initial conditions. Furthermore, 
there is no clear way of determining the dependence of the internal 
fluctuations on these factors, and its exploration using computer 
simulations is almost impossible due to the large size of the system. 

The mutation rate is an essential parameter to determine the 
time taken to reach the fittest sequence from the master one and to 
stabilize. In the deterministic limit, this time can be estimated by 
the characteristic time [14]. An analog to this time can be used to 
estimate an evolutionary time in fmite size populations [16]. The 
question arises as to what extent the characteristic time associated 
to a lineage (e.g. that of its fittest phenotype) is responsible for its 
survival. In other words, whether lineages with low characteristic 
time have a larger probability of being selected by natural selection 
in a finite population. It must be stressed at this point that, in the 
deterministic limit and above the error threshold, the only factor 
that determines the final outcome of the evolutionary process is the 
selective value of the phenotypes, independently of their charac- 
teristic time. It turns out that, as Fig. 2B and Fig. 6B depict, the 
error tail concentration is very low in the transition from the 
master sequence to the fittest one, and then internal noise caused 
by the finite size of the population becomes relevant. The major 
consequence is that the fate of evolution, as predicted by the 
deterministic model, can be drastically modified when dealing with 
finite size populations. As presented in the Results, this disparity is 
especially important when independent linages are competing in a 
constrained finite population. This has already been obtained in 
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Figure 7. Percentage of fixation of lineage L<^^' as a function of tKie population size in the competition against the other lineage L''' 
for different values of the mutation rate of the sequences that form L*^' and for different values of the amplification factor ^i: (A) 

A\^* = Af* = 5; (B) 4''='4f* = 10; (C) 4'* = ^f* = 20 and (D) 4"=4'' = 30. in all cases, the values of the other amplification factors are 
A^fl' ^Af ' = 2 and = 1 and the mutation rate of L*" is = lO^''. As before, the initial population is divided equally into genotypes /o of 

both lineages. The values of ;('^' used are: 0.1 (blue lines), 0.075 (green lines), 0.05 (red lines) and 0.025 (cyan lines). The population sizes simulated 
are: 10^, 10', lO'', 10^, 10^ and 10'. For low populations sizes 1000 runs were carried out for each experimental setup, whereas for A'>10'' two 
hundred runs were enough to have negligible statistical errors. 
doi:1 0.1 371/journal.pone.00831 42.g007 



[12,13] for a system formed by two independent populations 
(lineages) of error-prone replicators in a different fitness landscape. 
Here, we have interpreted this stochastic outcome from the 
characteristic time of the lineage (measured from the characteristic 
time of their fittest phenotypes). The lineage that arrives first to its 
fittest phenotype is able to displace the best adapted one, as 



deduced from a deterministic approach, only in the case of finite 
size populations. 

As has been stated above, the assumption that lineages are 
mutationally isolated is valid when sequence evolution occurs 
without a significative modification of their mutation rate. This 
hypothesis allows a complete computational treatment of the 
population dynamics even when it is formed by five lineages. 
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Figure 8. Percentage of fixation of lineage L<^' in terms of the population size in the competition against L''' for different initial 
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population is formed by /q'*. Each graph considers a different value of the amplification factors of the fittest phenotypes /i. Concretely: (A) 



A' 



(1). 



5; (B) A 



(!)_ 



10; (C) A'j>=Af^ = 20 and (D) A^j'=A^{" = 30. The rest of the amplification factors are: A\l 



(1). 



^0 - 



^2 and = 



and the mutation rates of L*'' and L*^' are = lO^'* and /Z^' = 0.025, respectively. As in the previous figure, for each experimental setup 1000 runs 
were performed for A' = 10-,10' and 200 runs for larger populations sizes. 
doi:1 0.1 371 /journal.pone.00831 42.g008 



When longer time scales are considered, lineages can then be 
mutationally connected and hence, the mutation rate can evolve. 
For instance, it could be assumed that some digits of the sequence 
(a locus) codify the mutation rate of the whole sequence. Under this 
assumption, a mutator phenotype could be fixed due to its better 
selective value, but later the population would evolve again 
towards a phenotype with a lower mutation rate. It is likely that a 
phenomenon of transient "switch" in the mutation rate would 



occur, similar to that described by [26]. However, this dynamics 
could be radically different if more complex fitness landscapes, 
that might be not only rugged but also dynamic, i.e. that change 
over time, are taken into account. Obviously, this raises the 
question as to what extent the results obtained from these models 
are applicable to real fitness landscape, e.g that of viruses. Recent 
papers have provided new experimental data and confirm that 
they are in general more rugged, as expected, and with a high level 



PLOS ONE I www.plosone.org 



9 



December 2013 | Volume 8 | Issue 12 | e83142 



The Advantage of Arriving First 



Table 2. Percentage of fixation of each lineage during the 
stochastic competition of five lineages. 
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Percentage of fixation of each of the five lineages with different mutation rates 
in terms of the population size (A^} and the amplification factor of the fittest 
phenotype {^i). As before, Ao = 2 and Ai. = 1 in all the lineages. For each 
experimental setup 1000 runs were performed for A^g[10-,10-^], and 200 runs for 

N>10*. 

doi:l 0.1 371/journal.pone.0083142.t002 

of neutrality (see [27] and references therein). Neutrality is already 
present in the two peak landscape and, indeed, causes the drastic 
rise of the characteristic time to achieve the fittest peak observed 
for finite size populations. In addition, this fitness degeneracy is 
partly responsible for the strong discrepancy with the deterministic 
outcome. The effect of ruggedness on the characteristic time has 
already been studied in a previous paper [15]. In that paper we 
studied the characteristic time of a population of replicators in 
more rugged landscapes, namely multiplicative with two peaks, 
binary rugged and Kauffman-NK landscapes, and showed that it 
presents a similar dependence on the mutation rate. Therefore, 
although real fitness landscapes are essentially more complex and 
differ globally from the double peak landscape model assumed in 
our study, the results derived under this assumption are of great 
interest from a local perspective, i.e a dynamics restricted to 
successive moves from one master phenotype to another master in 
its neighborhood in a relatively short time scale. 

The matter of whether the high value of the mutation rates of 
viruses results from natural selection is still under debate. Two 
main explanatory lines have been stated. On the one hand, natural 
selection would foster high mutation rates because they confer a 



large adaptability to environmental changes. A lower bound to the 
mutation rate would appear to maintain the information of the 
quasispecies, the error threshold. So, according to this line of 
reasoning, an optimal mutation rate will exist placed just above 
this error threshold [28]. On the other hand, an alternative 
explanation comes from the evidence that evolution is only acting 
instantaneously and on finite size populations. Therefore, natural 
selection would not be able to cross extended valleys between 
fitness peaks, mainly caused by the accumulation of deleterious 
mutations [29] . From this perspective, a high mutation rate would 
appear as a side effect of selection for high rephcative rates [30,31] 
(a classical example of "selection for" instead of "selection of 
[32]).^ 

This paper has shown that both explanations are not mutually 
contradictory. On the contrary, they explain two different 
manifestations of natural selection acting on populations with 
dilferent sizes. The fixation of the mutation rates during evolution 
depends strongly on the population size and it is highly likely that 
the direction of adaptation might have changed repeatedly 
according to the selective pressures that operate at each moment. 
When the population size is low, the characteristic time is high and 
the deleterious effect of the mutation rate causes the disappearance 
of the mutator lineage. For medium size populations, the adaptive 
capacity of the mutator lineage, reflected in its ability to arrive first 
to its fittest phenotypes, overcomes the deleterious effects and 
allows its selection over the entire population. For even larger 
population sizes, in the deterministic limit, the greater adaptability 
of the mutator lineage is no longer enough to displace the non- 
mutator lineage, as this has similar potential to achieving its fittest 
phenotype before disappearing, and then becoming fixed (to the 
detriment of the mutator lineage). 

A question that immediately arises from this discussion is 
whether the selection of mutator lineages is a consequence of a 
hitchhiking phenomenon, i.e. the selection of mutator alleles 
because they are linked to other advantageous alleles that are 
effectively selected by natural selection [24,33-37]. In light of our 
results, these mutator alleles are selected because of their selective 
advantage provided by a shorter evolutionary time. For interme- 
diate population sizes, a phenotype that belongs to a lineage that 
has the shortest evolutionary time enhances its probability of being 
selected. This phenotype gets an advantage not only by lifting a 
lineage but by riding die fastest one. 

Methods 

In some cases it is reasonable to describe the time evolution of a 
population formed by several phenotypes in terms of continuos 
variables, such as molar fractions. In the homogeneous case, the 
dynamics of each variable is usually described in terms of 
Ordinary Differential Equations. If the system exhibits an 
asymptotic behavior, all molar fractions approach their equilibri- 
um values and, by definition, the time they take to achieve this 
state is infinite. However, this mathematical information has low 
value in many practical problems where a stationary regime is 
approximately reached in a finite time. Many different methods 
have been suggested to get an estimation of the scale of this 
intrinsic system dynamics. We recently presented in [15] the 
characteristic time of a continuos variable as a way of overcoming 
important deficiencies in previous definitions, particularly for non- 
linear systems, by taking into account the whole path from the 
initial condition to the final state of a given trajectory. As discussed 
in that paper, this characteristic time can be interpreted as: (i) 
specifically for linear system, as a weighted average of the inverse 
of the system eigenvalues, (ii) the hypothetical time at which the 
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whole transition occurs and (iii) the mean time of the transition. 
More recently, this concept has also been applied to estimate the 
characteristic time of stochastic population systems where internal 
noise is considered [16]. 

The ODE systems that describe the time evolution of the 
population in the deterministic approximation that assumes an 
infinite size Eq. (3) are solved numerically using a Runge-Kutta 
scheme implemented under the standard software MATLAB. 
Numerical integration is stopped when the molar fraction of the 
fittest phenotype A\ at successive steps differs less than 10^** 
during at least 200 consecutive steps. It is assumed that initially 
only phenotypes Aq exist in the population. When more than one 
lineages is present, the same proportion of phenotypes Aq of each 
lineages is initially supposed. The trajectories of the phenotype 
with the largest amplification factor are then used to compute the 
characteristic time as described in Llorens et al. [14]. To be precise, 
if y{t) is a monotonous trajectory of the dynamical system with 
initial condition y{0) and equilibrium point then its characteristic 
time reads: 



poo dy 



(6) 



For finite size populations we have used a well-known stochastic 
approach, the so-called Gillespie's algorithm, to simulate the time 
evolution of the number of sequences of the possible phenotypes 
that can be formed in the system. The Gillespie's algorithm 
provides an exact simulation of the time evolution of the number 
of genomes of different phenotypes in a finite population [2 1] . The 
algorithm was implemented in C. To generate pseudorandom 
numbers we apply the Mersenne twister method [38] . To compute 
the characteristic time of the fittest sequence, the program controls 
the asymptotic phase of the simulations and determines the first 



time the population of phenotypes A\ becomes larger than that of 
phenotypes Aq, that is denoted as t„^ >„„. Here n\ and «o represent 
the number of genomes with phenotypes A\ and Aq, respectively. 
The final time of each simulation is taken as: 



(7) 



In so doing, we are assuring that the number of genomes of 
phenotype A\ is already in its asymptotic phase and then, its mean 
value is close to its steady state. A stochastic characteristic time is 
computed in a MATLAB framework according to the procedure 
previously described in [16]. Essentially, we approximate the 
stochastic realization by a monotonously increasing curve that 
converges to the mean value of the last 1000 values of «i. This 
resulting continuous curve is then used to estimate the character- 
istic time of a single simulation by means of the formula Eq. (6). 
Finally, the average characteristic time is computed from 200 
simulations. In addition, the standard deviation is also determined 
for each experimental setup. In the experiments that involve more 
than one lineage, a checking step for lineage disappearance is 
included in the program. In the case of two lineages, the 
simulation is stopped when aU phenotypes that form a lineage 
have died off When five lineages are competing, the simulation 
ends when four lineages disappear. 
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